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Abstract 

We present a computationally-efficient method for recovering sparse signals from a series of 
noisy observations, known as the problem of compressed sensing (CS). CS theory requires solving a 
convex constrained minimization problem. We propose to transform this optimization problem into 
a convex feasibility problem (CFP), and solve it using subgradient projection methods, which are 
iterative, fast, robust and convergent schemes for solving CFPs. As opposed to some of the recently- 
introduced CS algorithms, such as Bayesian CS and gradient projections for sparse reconstruction, 
which become prohibitively inefficient as the problem dimension and sparseness degree increase, the 
newly-proposed methods exhibit a marked robustness with respect to these factors. This renders the 
subgradient projection methods highly viable for large-scale compressible scenarios. 

I. Introduction 

Recent studies have shown that sparse signals can be recovered accurately using less observations 
than predicted by the Nyquist/Shannon sampling principle; the resulting theory is known as compressed 
sensing (CS) CQ, El- The essence of the new theory builds upon a new data acquisition formalism, in 
which compression plays a fundamental role. Sparse - and more generally - compressible signals arise 
naturally in many fields of science and engineering, such as the reconstruction of images from under- 
sampled Fourier data, biomedical imaging and astronomy O, (H. Other applications include model- 
reduction methods to enforce sparseness for preventing over-fitting and for reducing computational 
complexity and storage capacities. The reader is referred to Refs. 12 and Q for an extensive overview 
of CS theory. 

The recovery of sparse signals consists of solving an NP-hard minimization problem HI, Q. State- 
of-the-art methods for addressing this optimization problem commonly utilize convex relaxations, 
non-convex local optimization and greedy search mechanisms. Convex relaxations are used in various 
methods such as least absolute shrinkage and selection operator (LASSO) 0, least angle regression 
(LARS) 0, the Dantzig selector (DS) O, basis pursuit (BP) and basis pursuit de-noising (9]|. The 
two former schemes, the LASSO and LARS, are essentially homotopy methods that exploit pivoting 
operations on sub-matrices from the complete sensing matrix (sub-sensing matrices) for yielding a 
solution path to the convex optimization problem. These methods turn out to be extremely efficient 
whenever the sparseness level is relatively high owing to the fact that only a few sub-sensing matrices 
need to be provided, corresponding to the instantaneous support of the underlying reconstructed 
signal. The latter methods, the DS and the BP variants, recast a linear program and employ either 
simplex or interior-point techniques for obtaining an optimal solution. Similarly to the homotopy-based 
approaches, these methods may become computationally-intensive, since the number of elements in 
the support of the underlying signal increases. 
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Non-convex optimization approaches include Bayesian methodologies such as the relevance vector 
machine (RVM), otherwise known as sparse Bayesian learning ifTOll as well as stochastic search 
algorithms, which are mainly based on Markov chain Monte Carlo (MCMC) techniques lfTT1 - lfl4l . In 
virtue of their Bayesian mechanism and in contrast to other optimization approaches, these methods 
provide a complete statistical solution to the problem by means of a probability density function. 
Nevertheless, the intensive computational requirements of these methods render their applicability 
questionable in high-dimensional problems. 

Recently, the Bayesian framework was utilized to create efficient CS schemes |[T5l . ifToll . The 
Bayesian CS algorithm in [01 exploits both a sparseness-promoting hierarchical prior and an RVM 
mechanism for deriving point estimates and statistical error bounds. This method was shown to 
outperform some of the commonly-used greedy schemes both in accuracy and speed. The work in 
lfT6l derived a pseudo-measurement-based Kalman filtering algorithm (CSKF) for recovering sparse 
signals from noisy observations in a sequential manner. This approach extends CS to accommodate 
stochastic linear filtering problems and, similarly to the aforementioned Bayesian methods, yields a 
complete statistical solution to the problem (a Gaussian distribution). 

Notable greedy search algorithms are the matching pursuit (MP) ifTTll . the orthogonal MP lfT8l . 
and the orthogonal least squares (OLS) |[T9l . Both MP and OMP minimize the reconstruction error 
by iteratively choosing elements from the dictionary matrix (sensing matrix). The OMP involves an 
additional orthogonalization stage and is known to outperform the conventional MP. The OLS works in 
a similar fashion, employing an orthogonal transformation of the original sensing matrix. The greedy 
algorithms are known to posses an advantage in terms of computational cost over other optimization 
schemes when the basis projections are sufficiently incoherent and the number of elements in the 
support is relatively small. This, in turn, implies that their performance may deteriorate dramatically 
in common realistic settings wherein the underlying signals are compressible rather than sparse. 

The scalability of a CS technique essentially refers to its viability for high dimensional settings. 
This issue is given a prime consideration in 1201 . where a new gradient projection (GP)-based 
method is proposed for solving possibly large-scale CS problems. As pointed out in ET31 . large-scale 
methods require only matrix-vector products involving the sensing matrix. As demonstrated in EUl . 
the GP algorithm is efficient in high-dimensional settings compared to the aforementioned methods. 
However, similarly to the homotopy methods, the practical implementation of the GP algorithm 
requires a delicate tuning of the Zi-norm bounding parameter, which greatly affects its convergence 
and optimality. 
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A. Restricted Isometry Property and Convex Relaxations 

The theory of compressed sensing has drawn much attention to the convex relaxation methods. It 
has been shown that the convex l\ relaxation yields an exact solution to the recovery problem provided 
two conditions are met: 1) the signal is sufficiently sparse, and 2) the sensing matrix obeys the so- 
called restricted isometry property (RIP) at a certain level. Another complementary result ensures 
high accuracy when dealing with noisy observations. Further elaboration of this result facilitated its 
probabilistic version, which is concluded by the known statement of recovery 'with overwhelming 
probability' HI. To put it informally, it is highly probable for the convex l\ relaxation to yield an exact 
solution provided the involved quantities, the sparseness degree s, and the sensing matrix dimensions 
m x n maintain a relation of the type s = 0{m/ log (n/m)). 

B. Convex Feasibility Problems, Subgradient Projections, and Compressed Sensing 

The issue of computational efficiency is of prime importance for large-scale CS problems. Ideally, a 
computational scheme for recovering sparse and/or compressible signals should be characterized by (i) 
computational efficiency; (ii) minimum mean-square error of the estimated signals; and (iii) scalability. 
Additional desirable features would be simple and transplant implementation (no "black boxes"); 
robustness with respect to some sparsensess/compressibility measure; and ubiquitous applicability to 
real- world problems. 

A potentially promising approach for satisfying the aforementioned goals is to transform the CS 
problem into a convex feasibility problem (CFP). CFPs are aimed at finding a point in the intersection 
set of nonlinear convex sets. The CFP is a fundamental problem in numerous fields, see, e. g., 11211 . 
lT22l and references therein. It has been used to model real-world problems in image reconstruction 
from projections E3l . in radiation therapy treatment planning E4l . and in crystallography 1231 . 

Most often, CFPs are solved by performing projections onto the individual sets. This is carried 
out in various ways, by using different projection methods, resulting in a myriad of algorithms that 
usually exhibit good convergence while consuming minimal computational resources. Some of the 
projection algorithms may even provide predictable performance when the solution set of the CFP is 
empty ||26l . We refer the reader to Ref. 11271 for a discussion on the connection between projection 
methods and variational inequalities; to Refs. 11281 . ||29l for applications in signal processing; and to 
lf30l Chapter 5] for a comprehensive review. 

However, projections onto convex sets may be prohibitively difficult when the constraints are 
nonlinear. This is so because computing orthogonal projections onto arbitrary convex sets requires 
a separate, inner-loop minimization for finding the minimum distance between a given point and an 
arbitrary curve, a process that usually involves a considerable computational effort. 
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An elegant alternative is to use subgradient projections. Subgradient projections only require the 
instantaneous subgradient in order to perform the next iteration. The projection is performed onto an 
intermediate point, and not directly onto the convex set. Some of the subgradient projection methods, 
such as cyclic subgradient projections OTI have a convergence proof when the intersection set of all 
the convex constraints is non-empty. 

In this paper, we use both cyclic and simultaneous subgradient projections (CSP and SSP) for 
efficiently solving the CS problem. The proposed algorithms are characterized by (i) computational 
efficiency; (ii) accuracy; (iii) robustness to varying compressibility and sparseness levels; and (iv) 
straightforward, transparent implementation. By using extensive numerical evaluations, we illustrate 
the superiority of the proposed scheme - in terms of computational efficiency and accuracy - compared 
to commonly-used methods for solving the CS problem. 

II. Sparse Signal Recovery and Compressed Sensing 

Consider a signal \ £ that is sparse in some domain, i.e., it can be represented using a relatively 
small number of projections in some known, possibly orthonormal, basis, say tp G M nx ' 1 . Thus, we 
may write 

n 

X = ip x = 'y] Xjipj = ^2 x 3^v \\x\\Q<n (1) 

t=l x-,-Gsupp(z) 

where supp(x) and ||x||o (the Iq norm) denote the support of x and its dimension (i. e., the number of 
non-zero elements of x), respectively. The problem of compressed sensing considers the recovery of x 
(and therefore of x) from a limited number, m < n, of incoherent and possibly noisy measurements (or, 
in other words, sensing a compressible signal from a limited number of incoherent measurement) ID. 
The measurements themselves obey a linear relation of the form 

y = H' x = Hx (2) 

where H G ]^ mxn an d }{ — H'ip. In many practical applications, the observation vector y may be 
either inaccurate or contaminated by noise. In this case, which will be referred to as the stochastic 
CS problem, an additional noise term is added to the right-hand side of (O. 

In general, under certain limitations on the sparseness degree of x, s = \\x\\o, an exact solution to 
the above recovery problem can be obtained by essentially solving a subset selection problem of the 
form 

min ||5||o s.t. \\y — HxW^ < e (3) 

X 

for a sufficiently small e. However, the problem ([3]) is known to be NP-hard, which implies that in 
practice an optimizer £ cannot be computed efficiently. 
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In the late 90's, the l\ norm was suggested as a sparseness-promoting term in the seminal work 
introducing the acclaimed LASSO operator and the basis pursuit (9). Recasting the sparse recovery 
problem using the l\ norm yields a convex relaxation of the original NP-hard problem, which can be 
efficiently solved via a myriad of well-established optimization techniques. Commonly, there are two 
equivalent convex formulations that follow from ([3]): The quadratically-constrained linear program, 
which takes the form 

min ||5||i s.t. \\y — Hx\\2 < e (4) 

X 

and the quadratic program 

min \\y — Hx\\2 s.t. ||a||i < e' (5) 

X 

It can be shown that for proper values of the tuning parameters e and e' the solution of both these 
problems coincide. 

Recently, (J, (3 have shown that an accurate solution of (0) can almost always be obtained by 
solving the convex relaxation (0]) assuming that the sensing matrix H obeys the RIP mentioned before. 
The RIP roughly implies that the columns of a given matrix nearly behave like an orthonormal basis. 
This desired property is possessed by several random constructions, which guarantee the uniqueness 
of the sparse solution. In particular, an exact recovery is highly probable when using such matrices 
provided that a relation of the type 

s = 0{m/ log(n/m)) (6) 

holds. For an extensive overview of several RIP constructions and their role in CS, the reader is 
referred to ID, 13 • 

In practice, the projected signal x may be nearly sparse, in the sense of having many relatively 
small elements, which are not identically zero. Such representations, frequently encountered in real- 
world applications, are termed compressible. Most of the results in the CS literature naturally extend 
to the compressible case assuming some behavior of the small nonzero elements. Such a behavior 
is suggested in 0, where the compressible element sequence is assumed to decay according to the 
power law 

|scj| < Ki~ 1/r , \xi\ > (7) 

where k > and r > are the radius of a weak l r ball to which x is confined, and a decay factor, 
respectively. In this case, an equivalent measure of the signal sparseness degree, s, can be obtained 

as 

s = n — card{i | \xi\ < e} (8) 
for some sufficiently small e > 0, where card{-} denotes the cardinality of a set. 



DRAFT 



7 

III. Sub gradient Projections for Compressed Sensing 

In this section, we outline the convex feasibility problem and two algorithms for a solution thereof: 
The cyclic subgradient projections (CSP) and the simultaneous subgradient projections (SSP). We will 
explain how these algorithms and their variants, collectively referred to as convex feasibility methods, 
are implemented for efficiently solving the convex CS problem described by Eqs. (@]) or ([5]). 

A. The Convex Feasibility Problem 

Given p closed convex subsets Qi,Q?, - • • ,Q P C M n of the n-dimensional Euclidean space, 
expressed as 

Qi = {z G R n | fi(z) < 0} , (9) 

where fa : W 1 — > R is a convex function, the convex feasibility problem (CFP) is 

find a point z* G Q := D p i=1 Q l . (10) 

If Q 0, the CFP is said to be consistent. Consequently, it is required to solve the system of convex 
inequalities 

/<(*)< 0, i = l,2,...,p. (11) 

The context of convex inequalities gives rise to the realm of subdifferential calculus, in which the 
definitions of subgradients and subdifferentials play a fundamental role. Given the convex function 
fi : R n — > R, a vector t G R n is called a subgradient of fa at point zq if 

fi(z)-fi(z )>(t,z-z ) (12) 

The subdifferential of fi at zq, denoted by dfi(zo), is the non-empty compact convex set 

dfi(z ) := {t : fi(z) - fi(z ) >(t,z- z ), Vz} (13) 

When fi is differntiable at zq, the subgradient becomes a gradient, t = Vfi(zo). 

B. Subgradient Projections 

Subgradient projections have been incorporated in iterative algorithms for the solution of CFPs. 
They can be roughly categorized into two main categories, sequential and simultaneous. The cyclic 
subgradient projections (CSP) method for the CFP, which is a sequential subgradient projections 
algorithm, was developed by Censor and Lent 11311 and is summarized in Algorithm Q] 

A convergence result for the CSP method in the consistent case was provided in Il30l Chapter 5], 
where it was shown that if the functions fi(z) are continuous and convex on IR n Vi; Q := (~f i=l Qi ^ 0; 
and the subgradient is uniformly bounded, then any sequence {z k } produced by Algorithm Q] converges 
to a solution of the CFP, i. e., z k — > z* as k — > oo. The convergence proof in (30l Chapter 5] is based 
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Algorithm 1 The method of cyclic subgradient projections (CSP) 



Initialization: z° G M n is arbitrary. 

Iterative step: Given z k , calculate the next iterate z k+l by 



^-p-t fc , if f m {z k ) > 0, 
II2 

if f m (z k ) < 0, 



(14) 



where t k G dfu^z ) is a subgradient of f^) at the point z k , and the relaxation parameters {a k } 



k\oo 



k=0 



are confined to an interval e\ < a k < 2 — e 2 , for all k > 0, with some, arbitrarily small, e\, e 2 > 0. 
Constraint-Index Control: The sequence {«(A;)}^ =0 is cyclic, that is, i(k) = k( mod p) + 1 for all 

k > 0. 



on the concept of Fejer monotonicity: A sequence {z k } k x L is Fejer monotone with respect to some 
fixed set Q C JR n , if Vz G Q, 

||z fc+1 -z|| 2 < \\z k -z\\ 2 , V£;>0 (15) 

Sequential projection methods for solving CFPs usually have simultaneous counterparts. The si- 
multaneous subgradient projections (SSP) method IT321 . Il33l is a simultaneous variant of the CSP, and 
is given in Algorithm [2] 

Algorithm 2 The method of simultaneous subgradient projections (SSP) 
Initialization: z° G W 1 is arbitrary. 

Iterative step: 

i. Given z k , calculate, for all i G / = {1, 2, . . . ,p}, intermediate iterates £ fc+1 > ! by 



zk ~ a k^At k , if >0 



nr (i6) 

z k , if / 4 (z fc ) < 0, 



where t k G dfi(z h ) is a subgradient of at the point z k , and the relaxation parameters {afc}£^l 
are confined to an interval e\ < au < 2 — e 2 , for all > 0, with some, arbitrarily small, ei, e 2 > 0. 
ii. Calculate the next iterate z k+1 by 

p 

z k+i = J2w i ( k + 1 > i (17) 

i=l 

where wi are fixed, user-chosen, positive weights with Yli=i w i = 1- 
The convergence analysis for this algorithm is available only for consistent (Q 7^ 0) CFPs, see IT321 . 

Bl. 
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C. CSP and SSP for Compressed Sensing 

Based on Eqs. (@]) and ©, we may formulate the following CS problem: Given a real-valued mxn 
matrix H, a measurement vector y S R m and a sparse (or compressible) vector x £ R n , 

find x* EM" s. t. y = Hx*, \\x*\\i < e (18) 

Eq. (fT8l ) can be translated into the language of CFP. To do this, we define m closed convex sets 
Qli Q21 • • • > Qm f= M n expressed as 

Q i = {xGR n \y i = {h i , x)}, (19) 

where hi indicates the z-th row of the matrix H. In this case, at a given iteration k, t k = V(/ij, x) = hi. 
An additional closed convex set Q m +i C M. n is defined as 

Q m+1 = {x G M n I IMIi - e < 0} (20) 

For the latter convex set, we note that a subgradient at the origin can be chosen so that 

t k G a||x||i U =0 = llxn (21) 

where li X n is an n-dimensional vector of unit entries. It can be easily verified that the subgradient 
chosen in Eq. (1211) satisfies the subgradient definition of Eq. (fT2l) . This subgradient may be generalized 
\/x S W 1 by adopting the convention 




sign(xj) = < 
and writing, at some iteration k, 

t k = sign(x k ), ||t fc ||2 = n , Vfc. (23) 

The next stage is to implement the CSP algorithm in order to find a feasible solution x* G Q := 
Qi for the CFP formulation of the CS problem (TT8l) . Following the recipe of Algorithm [T] using 
the relationships (1211 ). (1221) and (1231 . such an algorithm is formulated as follows: 

An application of Algorithm [2] for the CS problem (fT8l ). termed SSP-CS, can be done in a similar 
manner to Algorithm |3j and is omitted here for the sake of conciseness. 

Note that the inactive step of Algorithms Q] and |2j i. e., when the current iterate remains unchanged, 
is unnecessary if the constraints are linear equations (cf. 11301 Chapter 6]). This fact helps increasing 
the computational efficiency. In the same context, it is worth noting that without using the 1-norm 
constraint implemented in Eq. (1231 ). the recovery of x from the linear set of equations y = Hx 
is possible, under certain regularity conditions, using the well-known algorithm of Kaczmarz ll30l 
Chapter 6]. In this case, the obtained solution x* would minimize \\x\\2- However, adding the 1-norm 
constraint is key to a successful recovery when x is sparse (or compressible). 
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Algorithm 3 Cyclic subgradient projections for compressed sensing (CSP-CS) 



TO 



Initialization: x £ W l is arbitrary. 

Constraint-Index Control: Set i(k) = k( mod p) + 1 for all k > 0, where p = m + 1. 
Iterative step: For 1 < i < m, calculate x k+1 by 




(24) 



with the relaxation parameters {a k }^ =0 confined to the open interval (0, 2). 
For i = p = m + 1, calculate x k+1 by 



= x k - A 



A: 



1 



— e 



sign(x fc ) H[||f fe ||i-e] 



(25) 



where T-L[x] denotes the standard step function (i.e., T~L[x] = 1 for x > and 1-L[x\ < otherwise), 
and {A fc }£jl are confined to the open interval (0, 2). 



The convergence of Algorithms [3] and its SSP variant to some feasible solution x* is guaranteed, 
provided that the CFP is consistent (i. e., the solution set is non-empty). However, for real-world, large- 
scale problems, one cannot determine a-priori whether a given value of e renders the CFP consistent; 
this is essentially a matter of tuning. The behavior of the subgradient projections algorithms for the 
inconsistent case was investigated in Ref. 1261 . where it was shown that simultaneous subgradient 
projections methods usually yield better convergence in the inconsistent case. However, simultaneous 
projection methods tend to converge slower than sequential methods; it is thus a good idea to combine 
these two methods in order to fight instabilities and obtain fast convergence. This issue is discussed 
in the next section. 

IV. Practical Implementation 

In this section we discuss several issues concerning the practical implementation of the CSP-CS 
and SSP-CS algorithms. 

A. Stopping Criteria 

The CSP-CS and SSP-CS routines are terminated when either some predetermined maximal number 
of iterations is exceeded, or when there is no significant difference between two consecutive estimates, 
viz. || x k+1 — x k || 2 < 7 for some small 7 > 0. 

B. Refinements 

As mentioned in the previous section, in some cases the set of constraints may not be consistent 
due to an improper setting of e, which in turn implies that the convergence of the CSP method is not 
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guaranteed. Typically, in such scenarios the estimated parameters persistently fluctuate around some 
nominal value. The reconstructed signal itself can be highly accurate; however, its entries do not attain 
a fixed value. This problem can be alleviated by incorporating an additional refinement stage. In this 
work, we propose two such refinement approaches, each of which relies on a different technique for 
producing refined estimates. 

1 ) Gauss-CSP: After a given number of CSP-CS iterations, the support of the unknown signal can 
be readily approximated based on the magnitudes of the estimated elements. As soon as the support 
is given, the signal can be accurately reconstructed following a simple least squares (LS) procedure. 
This approach is inspired by the strategy adopted in (H for correcting the bias inherent to the Dantzig 
selector (the corrected scheme was referred to in JH as the Gauss-Dantizg selector). This technique, 
applied to the CSP-CS algorithm, will be referred to as the Gauss-CSP . This variant of CSP-CS is 
summerized in Algorithm 01 

Algorithm 4 Gauss-CSP 

CSP-CS Stage: Perform several CSP-CS iterations based on Algorithm [3] until a stopping criteria is 
reached. 

LS Stage: Set N < m as the maximal support size of the obtained estimate x k . Define a set of indices 
G := {ji}f = i of the most significant entries in x k in terms of magnitude. Solve 

P=(H T tiy 1 H T y (26) 

where H G ]g mxAr [ s composed of the columns of H corresponding to the indices in G. Set x k = 0, 

VZ £ G and x k u = ft, i = 1, . . . , N. 



A slightly different implementation of the Gauss-CSP, referred to as alternating Guass-CSP, can 
dramatically improve the convergence properties of the CSP method. This consists of incorporating 
the LS stage directly into the CSP algorithm in an alternating manner. This variant of the Gauss-CSP 
typically converges much faster than any other alternative. Following this approach, the LS routine is 
executed whenever the cyclic index i in Algorithm [3] reaches an arbitrary predetermined value. 

2) CSP-SSP: Normally, running a few SSP iterations subsequent to the termination of the CSP 
routine improves the accuracy of recovery. This stems from the improved behavior of simultaneous 
subgradient projections method in the inconsistent case, discussed in the previous section. 

C. Block Processing 

In some programming environments, it might be more computationally efficient to process a group 
of / observations t/i = (hi,x), i = j, . . . , j + I at each iteration rather than a single one. Our own 
implementation of the CSP-CS exploits this simple idea for alleviating the workload on the MATLAB® 
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interpreter, which may become prohibitively slow when loops are involved. This approach does not 
require any considerable modification of the original algorithm. In practice, this is accomplished by 
running the cyclic index from 1 to p/l + 1 while repeating the CSP update stage d24l ) I times at each 
iteration (one update for each observation in the block). 

V. Illustrative Examples 

In this section, the new convex feasibility programming algorithms are assessed based on an 
extensive comparison with some of the commonly-used methods for CS. The algorithms considered 
herein consist of the homotopy method LARS 0, the two greedy algorithms, OMP lfl"8l and BP l9"1, 
the recently-introduced gradient projection (GP)-based method of EOl . and the Bayesian CS (BCS) of 
lfl"5l (the links for the MATLAB® implementations of the various methods used here are provided in the 
Appendix). In order to highlight the weaknesses and virtues of the various methods, we examine here 
both synthetic and realistic scenarios, associated with the two signal types frequently encountered in CS 
applications: Sparse and compressible (nearly sparse in the sense of ([8])). An indicator for the difficulty 
of recovery is assigned for each problem based on its unique settings (dimension and sparseness 
degree). This measure, termed here the recovery index, is derived from the relation m > c(s log n) CD 
(for some c > 0) as 

{s/m)\ogn (27) 

This index essentially refers to the probability of recovery assuming the sensing matrix obeys the RIP 
up to a certain sparseness degree. As it was already pointed out in HI, as this measure increases an 
exact recovery becomes less probable. 

Throughout this section, the signal reconstruction error is computed as 



/ \\ X — X 9 

e: =\/^) <28) 

where the normalizing term d{x) is set according to the signal type, 

{V™ i mm(x 2 , a 2 ), for sparse x 
z^=i v J, (29) 
|| x || 2, for compressible x 

with a being the standard deviation of the observation noise. The above formulation consolidates both 

the ideal and the normalized recovery measures that are used for assessing the reconstruction accuracy 

in 00 and ITT31 . respectively. The mean reconstruction error is computed in a similar fashion over N 



runs via -yA/V -1 ^ e 2 (i), where e{i) denotes the error in the ith run. 

A. Least Squares Augmentation 

The approach underlying the Gauss-Dantzig (H and the Gauss-CSP in Algorithm [4] which es- 
sentially consists of incorporating an additional LS stage for refining the obtained estimates, can be 
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easily applied for any other CS method. This poses a question as to what should be a fair comparison 
when considering the Gauss-CSP variant. Intuitively, one could think that the relative improvement 
of the LS stage as described in Algorithm [4] would be identical irrespectively of the CS method, 
and therefore such variants should not be compared at all. This statement, however, turns out to be 
incorrect, as shown by our experiments. In order to maintain our observations as equitable as possible 
in these circumstances, our comparison involves, in addition, an LS-augmented variant of each of the 
aforementioned CS techniques. 

B. Synthetic Example 

In the first scenario, the various methods are applied for the recovery of a sparse signal from 
noisy observations. The sensing matrix H E fl£ mxn use d here consists of normalized random entries 
sampled from a zero-mean Gaussian distribution with a standard deviation of Xj^fra. This type of 
random construction has been shown to obey the RIP up to a reasonable sparseness level (see 0). 
We examine the recovery performance of the CS algorithms in various settings consisting of different 
problem dimensions, ranging from 512x1024 (small) through 1024x2048 and 2048x4096 (medium) 
to 3072x6144 (large), and different sparseness levels. The original signal x is composed of only few 
nonzero random elements, which are uniformly sampled over [—1,1], and of which the indices are 
randomly picked between 1 and n. In all runs, the measurement noise standard deviation is set as 0.01. 
The various algorithms' tuning parameters are taken as those that seemed to minimize the recovery 
error based on tuning runs. The CSP-CS relaxation parameters are set as 



which seemed to be the best values with respect to accuracy and convergence time. The CSP routine 
is terminated when either the number of iterations exceeds 5000 or when the normed difference 
between two consecutive estimates, || Xk+i — ifc lh> drops below some 7 where 7 = 0.01 for the low 
dimensional problem and 7 = 0.5 otherwise. 

Both the OMP and the LS refinement stage assume a maximal support size of the reconstructed 
signal. In all our experiments we have chosen this parameter as 1.5s where s is the actual support 
size (sparseness level) of x. 

The averaged performance over 50 Monte Carlo runs of the various methods in the small-scale 
scenario is depicted in Fig. Q] for different recovery indices ranging from 0.1 to 0.7. This figure shows 
both the mean ideal recovery error and the corresponding mean convergence time along with their 
standard deviations, which are illustrated using error bars. Observing the right panel in this figure 
reveals a performance hierarchy that places the CSP as the 2nd worst right after the LARS and just 
a bit before the BP. The remaining methods attain higher accuracy in this case. 




-1 



If k < 2000 



Otherwise 



(30) 



DRAFT 



14 

Nevertheless, the story completely changes when examining the LS-augmented methods. Here the 
Gauss-CSP attains the best accuracy over the entire range of sparseness degrees. The advantage of 
using the LS stage is likewise prominent in all other methods, as it significantly reduces the recovery 
error. By observing the left panel of Fig. [1] it can be easily recognized that the CSP is the 3rd slowest 
method in this case. The other computationally excessive methods here are the LARS and the BP. In 
this small-scale example, the convergence time of all methods excluding those of the greedy OMP 
and the GP roughly stays unchanged over the entire range of sparseness levels. The extreme slant of 
the OMP line is due to the nature of this algorithm, which tends to become computationally intensive 
as the assumed maximal support size increases. 

Note that the convergence times of the LS-augmented methods are omitted here and in the sequel, 
as they are nearly identical to those of the original unaugmented methods (the LS stage is imple- 
mented using the MAT LAB® pseudo-inverse command pinv, which is extremely fast even in high 
dimensions). 

The above insights are further emphasized in Table HI which repeats the values from Fig. Q] for two 
nominal recovery indices. The bold values in this table correspond to the averaged recovery errors of 
the LS-augmented methods. Thus, it can be clearly seen that the Gauss-CSP attains the best accuracy 
yielding a mean recovery error of around 2.05. 
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Fig. 1: The average performance (over 50 Monte Carlo runs) of the CS methods (including the 
computationally excessive ones, the LARS and the BP) for problem dimension 512x1024. The 
performance of the LS-augmented variants is depicted via solid lines. 



The performance of the algorithms in the medium and large scale scenarios is illustrated in Fig. |2] 
Here we have excluded the computationally intensive methods, the LARS and the BP, as their 
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OMP 
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2.50 
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1.03 (sec) 
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2.52 


2.37 
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GP 


3.95 


2.40 


4.58 




2.12 


0.20 (sec) 


0.48 (sec) 


CSP 


9.28 


2.05 


5.04 




2.07 


3.81 (sec) 


3.64 (sec) 



TABLE I: Recovery of sparse signals. The ideal recovery error (left columns) and convergence 
time (right columns) of the various methods for the problem dimension 512x1024. The bold values 
correspond to the accuracy of the two-staged LS-augmented variants. Averaged over 50 Monte Carlo 
runs. 

convergence time became prohibitively long. The performance of the remaining methods for the 
various problem dimensions is illustrated via two panels. Thus, the upper panel shows the convergence 
times for different sparseness levels, whereas the bottom panel shows the corresponding ideal recovery 
errors averaged over 50 Monte Carlo runs. As before, the standard deviations from the mean values 
are depicted using error bars. 

By observing the upper panel in this figure, it can be easily recognized that the CSP method is 
comparable in speed and even faster than the BCS over the entire range of sparseness levels in the 
medium scale scenarios. A similar conclusion applies when comparing the CSP with the OMP from 
a certain sparseness degree corresponding to a recovery index of around 0.4. The CSP turns out to be 
significantly faster than both the BCS and the OMP over almost the entire range of sparseness levels 
in the large scale scenario. 

The corresponding recovery errors of the various methods are presented in the bottom panel in this 
figure. Thus, it can be recognized that the unaugmented CSP yields a slightly worse accuracy than 
the other methods (notice the logarithmic ordinate). The 2nd worse method in terms of accuracy in 
this case is also the fastest of them all, the GP. Nevertheless, when looking at the augmented methods 
it turns out that, as we have already witnessed in the small-scale scenario, the Gauss-CSP is the best 
method in terms of accuracy. The 2nd best in this case is the LS-augmented GP. 

The above insights are supported by Table [TTJ which provides the timing and recovery error values 
in the large scale scenario for two nominal recovery indices. Thus, it can be easily seen that the Gauss- 
CSP outperforms the other methods in terms of accuracy and is also the 2nd fastest method (after the 
GP). Another interesting and important detail that stems from both Table [EI] and Fig. [2]is related to the 
fact that as opposed to the GP, of which the running time is highly sensitive to the sparseness degree, 
the CSP computation time remains almost unchanged with respect to this factor. This observation 
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could, in fact, be expected, as the CSP mechanism does not really distinguish between elements in 
the support and those which are not. This renders it highly robust and efficient when applied either 
to compressible problems or in such scenarios where the recovery index is relatively large. 
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Fig. 2: The average performance of the CS algorithms for various problem dimensions and sparseness 
degrees. Showing the convergence time (upper panel) and the ideal recovery error (bottom panel). 
The computationally excessive methods (LARS and BP) are not shown here. 



A comparison of the various CSP implementations that were discussed in Section ITV-B I is provided 
in Fig. [3] This figure demonstrates the convergence properties of the methods based on a single run 
for a problem dimension of 2048x4096. Thus, it can be easily recognized that the plain CSP attains 
the worst recovery error, as it begins to fluctuate around some nominal value. As was pointed out 
previously, this behavior indicates that the set of constraints in this case is inconsistent owing to an 
improper setting of the parameter e. This problem is alleviated in both of the variants, the CSP-SSP 
and the Gauss-CSP. Although both these methods outperform the plain CSP, it seems that the best 
attainable error is achieved by using the Gauss-CSP, whereas the fastest convergence is obtained by 
using the alternating CSP-LS scheme. 
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TABLE II: Recovery of sparse signals. The ideal recovery error (left columns) and convergence 
time (right columns) of the various methods for the problem dimension 3072x6144. The bold values 
correspond to the accuracy of the two-staged LS-augmented variants. Averaged over 50 Monte Carlo 
runs. 
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Fig. 3: The ideal recovery error of the CSP variants. Single run, problem dimension 2048x4096. 



C. Realistic (Compressible) Examples 

Compressible signals are of greater practical importance when it comes to real-world applications. 
As was previously pointed out, such signals are nearly sparse in the sense that they consist of many 
relatively small elements that are not identically zero. Following this, we consider here two realistic 
experimental studies involving compressible signals. The first example, which is conducted in the 
spirit of the previous synthetic one, consists of constructing the discrete Fourier transform (DFT) of an 
undersampled time series. The second experiment, which follows right after, involves the reconstruction 
of the famous Shepp-Logan phantom head image that is commonly used for assessing the performance 
of recovery schemes in tomography. 
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1) Constructing a D FT from Undersampled Data: In this example we consider a discrete signal 
in the time domain, which takes the form 

n i 

Vk = ^sin(cj^ fc ) (31) 
i=l 

where the frequencies Wj, i = 1, . . . , rif are uniformly sampled over [1, 107r]. Let x 6 R n be the DFT 
of yk over the discrete times k = 1, . . . , n, that is 

n 

x k = l/y/n^yj exp (-2n(j - l)(k - l)i/n) (32) 
j'=i 

which can be written compactly as x = Fy with F being the unitary DFT matrix. Now, suppose that 
we wish to reconstruct x from a randomly sampled set of points {yj}j^ =1 for which m << n. In other 
words, we attempt to compose the DFT x from an undersampled time series. This is a classical CS 
problem, for which the sensing matrix H is given as a partial inverse DFT matrix consisting of only 
m randomly-picked rows. Thus, 

y = Hx (33) 

where H is composed of m randomly picked rows from F* , corresponding to the time points in y. 
The vector x itself is expected to be compressible, essentially comprised of decaying coefficients in 
the vicinity of those which are associated with the underlying frequencies oji, i = 1, . . . , n/. 

Similarly to the synthetic case, we apply the various CS methods (excluding the computationally 
excessive ones) for different problem dimensions and sparseness levels. As distinct from the previous 
example, here the recovery index (|27T ) is computed based on the effective sparseness measure ([8]) with 
e = 0.05 maxj \xi\. The CS algorithms' tuning parameters are chosen to maximize accuracy based on 
tuning runs. The CSP relaxation variables and termination conditions remain unchanged. 

The averaged performance of the various methods over 50 Monte Carlo runs (in which a new set 
of frequencies {wi}™^ is sampled at the beginning of each run) is depicted in Fig. |4] The upper 
panel in this figure shows the CS methods' mean convergence times and their associated standard 
deviations (error bars) for different problem dimensions (ranging from 512x1024 through 1024x2048 
to 2048x4096) and effective sparseness levels (corresponding to recovery indices of between 0.1 to 
0.8). The corresponding recovery errors of the various methods are provided in the bottom panel of 
this figure. 

By observing both these panels, it can be clearly recognized that the CSP method maintains the 
best tradeoff between accuracy and computational load as the problem becomes more complex (as 
indicated by both its dimensionality and sparseness degree). In virtue of its underlying mechanism, 
the CSP exhibits robustness with respect to both these factors as it attains recovery errors that are 
comparable to those of the BCS and the OMP at a nearly fixed computational cost. This renders it 
the fastest reliable method as the problem dimension increases. Although the GP is the fastest scheme 
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in this case, its accuracy is extremely low with recovery errors of nearly the magnitude of the signal 
itself. The LS-augmented GP does posses a clear advantage over the unaugmented one; however, 
its performance is still unsatisfactory (with recovery errors of almost twice than those of the other 
methods). 
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Fig. 4: Recovering a DFT from undersampled data. Showing the average performance of the CS 
algorithms for various problem dimensions and effective sparseness degrees. The convergence time 
and the normalized error are depicted in the upper and lower panels, respectively. 



The above insights are further illustrated in both Tables |TTT] and [TV] in which the timing and recovery 
error values are repeated for the small and large scale scenarios, respectively. Thus, it can be seen 
from the first table (small scale) that the CSP attains a recovery error that is similar to those of the 
BCS and the OMP. Its convergence time, however, is longer compared to the other methods. The 
prominent advantage of the CSP is manifested in the 2nd table (large scale) in which it attains the 
best performance in terms of accuracy and convergence time for a relatively large recovery index 
of 0.8. Its robustness to the sparseness level is illustrated in both these tables by the nearly fixed 
convergence time for the two extremal values of the recovery index. 

Figure [5] depicts the performance of the CSP in recovering typical sparse and compressible signals 
of nearly the same recovery index. This figure suggests that, in practice, though the recovery indices 
are nearly the same, it might be more difficult to reconstruct a compressible representation rather than 
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TABLE III: Recovering a DFT from undersampled data. The normalized recovery error (left columns) 
and convergence time (right columns) of the various methods for the problem dimension 512x1024. 
The bold values correspond to the accuracy of the two-staged LS-augmented variants. Averaged over 
50 Monte Carlo runs. 



Method 



OMP 
BCS 
GP 
CSP 



(s/m) logn = 0.2 



0.10 
0.09 
0.40 
0.09 



0.10 
0.10 
0.19 
0.10 



(s/m) logn = 0.8 



0.17 
0.20 
0.66 
0.18 



0.17 
0.18 
0.31 
0.17 



(s/m) log n = 0.2 



7.20 (sec) 
15.40 (sec) 
0.47 (sec) 
12.87 (sec) 



(s/m) log n — 0.8 



226.80 (sec) 
50.62 (sec) 
0.53 (sec) 
13.11 (sec) 



TABLE IV: Recovering a DFT from undersampled data. The normalized recovery error (left columns) 
and convergence time (right columns) of the various methods for the problem dimension 2048x4096. 
The bold values correspond to the accuracy of the two-staged LS-augmented variants. Averaged over 
50 Monte Carlo runs. 



a sparse one. This follows from the fact that the compressible estimate on the right sub-figure is less 
accurate than its companion on the left. 

2) Image Recovery from Undersampled Radial Fourier Coefficients: In the last part of this section, 
we demonstrate the performance of some of the methods in recovering an image using undersampled 
2D Fourier coefficients that are computed along radial lines. The example considered here follows 
along the lines of (H, where the Shepp-Logan phantom head image is used. In our experiment, 
however, we use a low-dimensional 128x128 version of this image, which yields a signal of dimension 
128 2 = 16384. We examine the CS methods for two scenarios in which the 2D Fourier coefficients 
are sampled along either 32 or 64 radial lines (corresponding to nearly 25% or 50% of the available 
data). 

Three methods are applied for recovering the phantom head image: BCS, GP and CSP (unaugmented 
versions). However, the performance of only two of them, the GP and the CSP are shown in Fig. [6] 
as the BCS exhibited poor performance, essentially yielding recovery errors of nearly the signal 
magnitude (specifically, around 0.9). Figure [6] clearly shows the superiority of the CSP over the GP 
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Fig. 5: Illustration of the recovery performance of the CSP method for sparse and compressible signals. 



in both scenarios. 

VI. Conclusions 

A novel compressed sensing (CS) scheme was introduced for solving large-scale compressible 
problems. The new method utilizes the instantaneous subgradient for projecting the previous iterate 
on some intermediate point, thereby approaching the underlying convex feasibility set resulting from 
a group of possibly nonlinear constraints. The cyclic subgradient projection (CSP) mechanism, which 
is the heart of the new approach, facilitates the efficient solution of large-scale CS problems owing to 
its implementation, which involves vector products only. An extensive numerical comparison of the 
CSP-CS algorithm and its variants with some of the state-of-the-art CS schemes clearly demonstrated 
its superiority in high-dimensional compressible settings. 

In particular, we may draw the following conclusions. First, the newly-proposed method maintains 
a nearly fixed computational cost, irrespectively to the problem dimension and sparseness level; more- 
over, the new method easily copes with realistic scenarios involving high dimensional compressible 
signals. 

Second, a de-biased CSP, which utilizes an additional least-squares stage, improves the performance 
for sparse signals, showing superiority compared to other de-biased methods. 

Finally, the extensive numerical comparison of all the methods clearly shows that the CSP maintains 
the best tradeoff between computational efficiency and accuracy as the problem dimension and sparse- 
ness level increase. As such, we conclude that the CSP is the most efficient method for large-scale 
compressible problems. 
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(a) Original (b) Normalized error 0.12 (c) Normalized error 0.22 

(d) Original (e) Normalized error 0.19 (f) Normalized error 0.28 

Fig. 6: The original and reconstructed 128x128 Shepp-Logan phantom head. Showing the CSP (upper 
panel) and the GP (lower panel) methods. The middle and right rows depict the recovery error based 
on 64 and 32 projections, respectively (equivalent to 50% and 25% undersampled data). 
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Appendix 

The MATLAB® files implementing our new algorithms are available at 
http : / /www . technion .ac.il/ pgurf il/csp-cs/. All the other source files for the vari- 
ous CS algorithms implemented herein can be found at the following locations: 

• Least Angle Regression (LARS) : 

http : / /www . mathworks . com/mat labcentral/f ileexchange/2 31 8 6-lars-algorithm 
. Basis Pursuit (BP): 

http : / /www . a cm . caltech . edu/llmagic/ 
. Bayesian CS (BCS): 

http : //people . ee . duke . edu/ lihan/cs/ 
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Gradient Projection for Sparse Reconstruction (GPSR): 

http://www.lx.it.pt/ mtf/GPSR/ 
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Figure 1. The average performance (over 50 Monte Carlo runs) of the CS methods (including 
the computationally excessive ones, the LARS and the BP) for problem dimension 512x1024. 
The performance of the LS augmented versions of the algorithms is depicted via solid lines. 
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Figure 2. Illustration of the recovery performance of the CSP method for sparse and com- 
pressible signals. 
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Figure 3. The average performance of the CS algorithms for various problem dimensions and 
sparseness degrees. Showing the convergence time (upper panel) and the ideal normalized 
error (bottom panel). The computationally excessive methods (LARS and BP) are not shown 
here. 
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Figure 4. Recovering a DFT from undersampled data. Showing the average performance of the 
CS algorithms for various problem dimensions and effective sparseness degrees. The conver- 
gence time and the normalized error are depicted in the upper and lower panels, respectively. 
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Figure 5. The ideal normalized error of the CSP variants . Single run, problem dimension 
2048x4096. 









(a) Original 



(b) Normalized error 0.12 



(c) Normalized error 0.22 






(d) Original 
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Figure 6. The original and reconstructed 128x128 Shepp-Logan phantom head. Showing the 
CSP (upper panel) and the GP (lower panel) methods. The middle and right rows depict 
the recovery error based on 64 and 32 projections, respectively (equivalent to 50% and 25% 
undersampled data). 
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